Method and apparatus for encoding and/or decoding bit depth scalable video data using adaptive enhancement layer prediction

ABSTRACT

A scalable video bitstream may have an H.264/AVC compatible base layer (BL) and a scalable enhancement layer (EL), where scalability refers to color bit depth. The SVC standard allows spatial inter-layer prediction, wherein a residual in the EL is generated which is then intra coded. Another spatial intra coding mode for the EL is pure intra coding (I_N×N). The invention discloses a new intra coding mode and two new inter coding modes, particularly for bit depth scalability. The new intra coding mode encodes the residual between the upsampled reconstructed BL and the original EL, using mode selection. The two possible modes are residual prediction from the BL and additional intra coding of this residual. The new inter coding modes also predict the EL from the reconstructed BL. In the first inter coding mode, the residual is encoded using motion estimation based on this residual. In the second inter coding mode, the residual is encoded using upsampled motion information from the BL.

FIELD OF THE INVENTION

The invention relates to the technical field of digital video coding. It presents a coding solution for a novel type of scalability: bit depth scalability.

BACKGROUND

The video coding standard H.264/AVC provides various video coding modes and dynamic selection between them according to rate-distortion optimization (RDO). Its extension for Scalable Video Coding (SVC) provides different layers and, for spatial scalability, supports either direct encoding of the enhancement layer (EL) or inter-layer prediction. In direct encoding of the EL, a mode called I_N×N, redundancy between layers is not used: the EL is purely intra coded.

Inter-layer prediction is used in two coding modes, namely I_BL if the base layer (BL) is intra coded, and residual prediction if the BL is inter coded, so that BL and EL residuals are generated. With residual prediction, an EL residual is predicted from the BL residual.

For intra-coded EL macroblocks (MBs), SVC supports two types of coding modes, namely original H.264/AVC I_N×N coding (spatial prediction, base_mode_flag=0) and I_BL, a special SVC coding mode for scalability where an EL MB is predicted from a collocated BL MB.

For inter coding, the first step is generating BL and EL differential images called residuals. Residual inter-layer prediction is done for encoding the difference between the BL residual and the EL residual.

In recent years, color depths higher than the conventional eight bits have become increasingly desirable in many fields, such as scientific imaging, digital cinema, high-quality-video-enabled computer games, and professional studio and home theatre related applications. Accordingly, the state-of-the-art video coding standard H.264/AVC has included Fidelity Range Extensions (FRExt), which support up to 14 bits per sample and up to 4:4:4 chroma sampling.

For a scenario with two different decoders, or clients with different requests for the bit depth, e.g. 8 bit and 12 bit for the same raw video, the existing H.264/AVC solution is to encode the 12-bit raw video to generate a first bitstream, and then convert the 12-bit raw video to an 8-bit raw video and encode it to generate a second bitstream. If the video is to be delivered to different clients who request different bit depths, it has to be delivered twice, e.g. with the two bitstreams put together on one disc. This is inefficient with regard to both the compression ratio and the operational complexity.

The European Patent application EP06291041 discloses a scalable solution to encode the whole 12-bit raw video once to generate one bitstream that contains an H.264/AVC compatible BL and a scalable EL. Due to redundancy reduction, the overhead of the whole scalable bitstream over the above-mentioned first bitstream is small compared to the additional second bitstream. If an H.264/AVC decoder is available at the receiving end, only the BL sub-bitstream is decoded, and the decoded 8-bit video can be viewed on a conventional 8-bit display device; if a bit depth scalable decoder is available at the receiving end, both the BL and the EL sub-bitstreams may be decoded to obtain the 12-bit video, which can be viewed on a high quality display device that supports color depths of more than eight bits.

SUMMARY OF THE INVENTION

The above-mentioned possibilities for redundancy reduction are not very flexible, considering that the efficiency of a particular encoding mode depends on the contents of the image. Different encoding modes may be optimal for different sequences. The efficiency of an encoding mode is higher if more redundancy can be reduced and the resulting bitstream is smaller. The present invention provides a solution for this problem in the context of bit depth scalability.

Claim 1 discloses a method for encoding scalable video data that allows improved redundancy reduction and dynamic adaptive selection of the most efficient encoding mode. Claim 5 discloses a corresponding decoding method.

A corresponding apparatus for encoding is disclosed in claim 8, and a corresponding apparatus for decoding is disclosed in claim 9.

Three new SVC compatible coding modes of the EL for color bit depth scalability (CBDS) are disclosed: one for intra coding and two for inter coding. It has been found that coding the inter-layer residual directly is more effective for bit depth scalable coding.

The new intra coding mode encodes the residual between the upsampled reconstructed BL and the original EL (EL_(org)-BL_(rec,up)), wherein mode selection is used. In principle, the inter-layer residual is treated as N-bit video to replace the original N-bit EL video. The two possible modes are

1. a residual predicted from the BL is just transformed, quantized and entropy coded, and
2. this residual is additionally intra coded (I_N×N).

Conventionally, the best mode for an Intra MB was selected between I_BL mode and I_N×N mode of the original N-bit EL video, using RDO. With the presented new Intra mode, the best Intra MB mode is selected between I_BL mode and I_N×N of the N-bit inter-layer residual.
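The RDO-based choice between the two intra options can be sketched with the Lagrangian cost J = D + λ·R commonly used in H.264/AVC reference encoders. This is a minimal sketch, not the disclosed encoder: the mode labels, the distortion and rate figures and the value of λ are hypothetical, and the caller is assumed to have already run transform, quantization and entropy coding to obtain those figures.

```python
from dataclasses import dataclass


@dataclass
class ModeResult:
    name: str          # hypothetical mode label
    distortion: float  # e.g. SSD between original and reconstructed EL block
    rate_bits: float   # bits needed to code the block in this mode


def rd_cost(result: ModeResult, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return result.distortion + lam * result.rate_bits


def select_intra_mode(candidates, lam):
    """Return the candidate mode with the lowest RD cost."""
    return min(candidates, key=lambda r: rd_cost(r, lam))


# Hypothetical measurements for one MB of the N-bit inter-layer residual.
i_bl = ModeResult("I_BL", distortion=120.0, rate_bits=300.0)
i_nxn_residual = ModeResult("I_NxN_of_residual", distortion=125.0, rate_bits=220.0)

best = select_intra_mode([i_bl, i_nxn_residual], lam=0.5)
print(best.name)  # I_NxN_of_residual (J = 235.0 vs 270.0)
```

With λ = 0 only distortion counts and I_BL would win for these numbers; a larger λ favours the mode that spends fewer bits.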

The new inter coding modes predict the EL from the upsampled reconstructed BL (like the new intra mode) instead of from the BL residual. The two possible inter coding modes (switched by a flag) are

1. the residual (EL_(org)-BL_(rec,up)) is encoded using motion estimation based on this residual; and
2. the residual (EL_(org)-BL_(rec,up)) is encoded using motion information from the BL, thereby omitting motion estimation on the EL.
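The difference between the two inter modes lies in where the motion vector comes from. The sketch below contrasts a full-search block-matching ME with an SAD criterion (a common textbook choice, assumed here; the document does not specify the ME algorithm) against simply inheriting the BL vector. Function names, the search range and the toy frames are hypothetical.

```python
def block_sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the size x size block of `cur`
    at (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
               for y in range(size) for x in range(size))


def estimate_motion(cur, ref, bx, by, size, search_range):
    """Mode 1 sketch: full-search ME on the inter-layer residual frames.
    Returns the (dx, dy) with the smallest SAD inside the search window."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            inside = (0 <= by + dy and by + dy + size <= h and
                      0 <= bx + dx and bx + dx + size <= w)
            if not inside:
                continue  # candidate block would leave the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def inherit_bl_motion(bl_mv, spatial_ratio=1):
    """Mode 2 sketch: reuse the BL motion vector, scaled by the spatial
    ratio between the layers (1 for pure bit depth scalability)."""
    return (bl_mv[0] * spatial_ratio, bl_mv[1] * spatial_ratio)


# Toy 8x8 frames: `cur` is `ref` shifted by (2, 1).
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
print(estimate_motion(cur, ref, bx=2, by=2, size=2, search_range=2))  # (2, 1)
print(inherit_bl_motion((3, -1), spatial_ratio=2))                    # (6, -2)
```

Mode 2 skips the nested search loop entirely, which is where the encoder-side saving of the second mode comes from.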

According to the invention, reconstructed BL information units (instead of original BL information units or BL residuals) are upsampled using bit depth upsampling, and the upsampled reconstructed BL information units are used to predict the collocated EL information units. This has the advantage that the prediction in the encoder is based on the same data that are available at the decoder. Thus, the differential information or residual generated in the encoder better matches the difference between the bit depth upsampled decoded BL image at the decoder and the original EL image, and therefore the reconstructed EL image at the decoder comes closer to the original EL image.
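The advantage of predicting from the reconstructed rather than the original BL can be checked numerically. This minimal sketch assumes a 10-bit EL, an 8-bit BL obtained by truncation, a hypothetical uniform quantizer standing in for BL transform and quantization, a left shift as bit depth upsampling, and an EL residual transmitted without loss, so that only the prediction differs.

```python
N_BITS, M_BITS = 10, 8           # assumed EL / BL bit depths
SHIFT = N_BITS - M_BITS
QSTEP = 4                        # hypothetical BL quantizer step (transform not modelled)


def bl_reconstruct(values, step, m_bits):
    """Stand-in for BL quantization followed by inverse quantization:
    uniform scalar quantization with rounding, clipped to the M-bit range."""
    top = (1 << m_bits) - 1
    return [min(max(step * round(v / step), 0), top) for v in values]


def bit_depth_up(values, shift):
    """Bit depth upsampling by a left shift (one simple choice)."""
    return [v << shift for v in values]


el_org = [1023, 512, 37, 800, 255]          # N-bit original EL samples
bl_org = [v >> SHIFT for v in el_org]       # M-bit BL, here derived by truncation
bl_rec = bl_reconstruct(bl_org, QSTEP, M_BITS)

# Encoder residuals: prediction from the original BL vs. from the reconstructed BL.
res_from_org = [e - p for e, p in zip(el_org, bit_depth_up(bl_org, SHIFT))]
res_from_rec = [e - p for e, p in zip(el_org, bit_depth_up(bl_rec, SHIFT))]

# The decoder only has bl_rec, so it always predicts from bit_depth_up(bl_rec).
pred_dec = bit_depth_up(bl_rec, SHIFT)
el_dec_org = [p + r for p, r in zip(pred_dec, res_from_org)]
el_dec_rec = [p + r for p, r in zip(pred_dec, res_from_rec)]

print(el_dec_rec == el_org)  # True: encoder and decoder predictions match
print(el_dec_org == el_org)  # False: BL coding error leaks into the EL
```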

Information units may be of any granularity, e.g. units of single pixels, pixel blocks, MBs or groups thereof. Bit depth upsampling is a process that increases the number of values that each pixel can have. The value usually corresponds to the color intensity of the pixel. Thus, the possibilities for fine-tuned color reproduction are enhanced, and gradual color differences of the original scene can be better encoded and decoded for reproduction. Advantageously, the video data rate can be reduced compared to current encoding methods.
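The document does not fix a particular bit depth upsampling operator. Two simple choices are sketched here as assumptions: a plain left shift, and a rounded linear scaling that maps the full M-bit range onto the full N-bit range. The function names are hypothetical.

```python
def upsample_shift(v, m_bits, n_bits):
    """Left shift: maps the M-bit maximum (e.g. 255) to 255 << 2 = 1020."""
    return v << (n_bits - m_bits)


def upsample_scale(v, m_bits, n_bits):
    """Rounded linear scaling that maps the full M-bit range onto the
    full N-bit range (e.g. 255 -> 1023)."""
    max_m = (1 << m_bits) - 1
    max_n = (1 << n_bits) - 1
    return (v * max_n + max_m // 2) // max_m


for v in (0, 128, 255):
    print(v, upsample_shift(v, 8, 10), upsample_scale(v, 8, 10))
# 0 0 0
# 128 512 514
# 255 1020 1023
```

Either way, the upsampled BL only occupies 256 of the 1024 possible 10-bit values; the finer gradations have to be carried by the EL residual.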

An encoder generates a residual from the original EL video data and bit depth upsampled reconstructed BL data, and the residual is entropy encoded and transmitted. The reconstructed BL information is upsampled at the encoder side and in the same manner at the decoder side, wherein the upsampling refers at least to bit depth.

Further, the upsampling can be performed for intra coded as well as for inter coded images or MBs. However, different modes can be used for intra and inter coded images. Unlike intra coded images (I-frames), inter coded images, also called P- or B-frames, need other images for their reconstruction, i.e. images with another picture order count (POC).

According to one aspect of the invention, an encoder can select between at least two different intra coding modes for the EL: a first intra coding mode comprises generating a residual between the upsampled reconstructed BL and the original EL, and a second intra coding mode additionally comprises intra coding of this residual. In principle, the inter-layer residual is treated as higher bit depth video in the EL branch, replacing the conventional higher bit depth video. The residual or its intra coded version is then transformed, quantized and entropy coded. Conventionally, the best mode for intra MBs is selected between I_BL mode and I_N×N mode of the original EL video, using RDO. With the disclosed new intra mode, the best intra MB mode is selected between I_BL mode and I_N×N of the high bit depth inter-layer residual, using RDO.

According to another aspect of the invention, the encoder can employ an inter coding mode that comprises generating a residual between the bit depth upsampled reconstructed BL and the original EL. Further, the encoder may select for the EL between motion vectors that are upsampled from the BL and motion vectors that are generated based on said residual between the upsampled reconstructed BL and the original EL. The selection may be based on RDO of the encoded EL data.

According to one aspect of the invention, a method for encoding video data having a BL and an EL, wherein pixels of the BL have a lower bit depth than pixels of the EL, comprises the steps of

- transforming and quantizing BL data,
- inverse transforming and inverse quantizing the transformed and quantized BL data, wherein reconstructed BL data are obtained,
- upsampling the reconstructed BL data, wherein the upsampling refers at least to bit depth and wherein a predicted version of EL data is obtained,
- generating a residual between original EL data and the predicted version of EL data,
- selecting, for the case of inter coded EL, between at least two different inter coding modes, wherein a first inter coding mode comprises using upsampled BL motion information and a second inter coding mode comprises using motion information generated from said EL data,
- encoding the transformed and quantized BL data, and
- encoding said EL residual using the selected EL encoding mode, and encoding an indication of said mode for a decoder.
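The listed encoding steps can be traced for one small unit in the following minimal sketch. It is not the disclosed implementation: it assumes an identity transform, a hypothetical scalar quantizer, a left shift as bit depth upsampling, and RD costs of the two inter modes supplied by the caller; the motion compensated coding of the residual and the entropy coding are not modelled. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

M_BITS, N_BITS = 8, 10   # assumed BL / EL bit depths
QSTEP = 4                # hypothetical quantizer step; transform is identity here


def transform_quantize(block):
    return [round(v / QSTEP) for v in block]


def inverse_quantize_transform(levels):
    return [q * QSTEP for q in levels]


def bit_depth_upsample(block):
    return [v << (N_BITS - M_BITS) for v in block]


@dataclass
class EncodedUnit:
    bl_levels: List[int]
    el_residual: List[int]
    use_bl_motion: bool                       # mode indication for the decoder
    el_mv: Optional[Tuple[int, int]] = None   # sent only in the EL-motion mode


def encode_unit(bl_block, el_block, el_mv, cost_bl_mv, cost_el_mv):
    """Follows the listed steps for one inter coded unit."""
    bl_levels = transform_quantize(bl_block)                 # T + Q of BL
    bl_rec = inverse_quantize_transform(bl_levels)           # reconstructed BL
    el_pred = bit_depth_upsample(bl_rec)                     # predicted EL
    residual = [e - p for e, p in zip(el_block, el_pred)]    # inter-layer residual
    use_bl_motion = cost_bl_mv <= cost_el_mv                 # mode selection
    return EncodedUnit(bl_levels, residual, use_bl_motion,
                       None if use_bl_motion else el_mv)


unit = encode_unit([100, 55, 203], [401, 222, 812], el_mv=(3, 1),
                   cost_bl_mv=10.0, cost_el_mv=12.0)
print(unit)
```

When the BL-motion mode is chosen, no EL motion vector is stored in the unit, which reflects why that mode can also lower the bitrate.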

According to one aspect of the invention, the method for encoding further comprises the step of selecting, for the case of intra coded EL data, between at least two different intra coding modes, wherein at least one but not all of the intra coding modes comprises additional intra coding of said residual between the original EL data and the predicted version of EL data.

Advantageously, the two mentioned encoder embodiments can be combined into a combined encoder that can adaptively encode intra- and inter-encoded video data, using means for detecting whether encoded video data are Inter or Intra coded (e.g. according to an indication).

According to one aspect of the invention, a method for decoding scalable video data having a BL and an EL, wherein pixels of the BL have a lower bit depth than pixels of the EL, comprises the steps of

- receiving quantized and (e.g. DCT-) transformed EL information and BL information and a decoding mode indication,
- performing inverse quantization and inverse transformation on the received EL and BL information,
- upsampling the inverse quantized and inverse transformed BL information, wherein the bit depth per value is increased and wherein predicted EL information is obtained, and
- reconstructing EL video information from the predicted EL information and the inverse quantized and inverse transformed EL information, wherein a decoding mode according to said decoding mode indication is selected,
- wherein possible decoding modes comprise
  - a first mode, wherein in the case of inter coded EL information the inverse quantized and inverse transformed EL information is decoded using motion information extracted from the EL information, and
  - a second mode, wherein in the case of inter coded EL information the inverse quantized and inverse transformed EL information is decoded using motion information extracted from the BL information.
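The two inter decoding modes can be sketched in one dimension as follows. Several assumptions are made that the text does not fix: the decoded EL data are taken to be the motion compensated prediction error of the inter-layer residual, motion is integer-pel with edge clamping, base_mode_flag set means "use BL motion", and the inverse transform is the identity. All names are hypothetical.

```python
M_BITS, N_BITS = 8, 10   # assumed BL / EL bit depths
QSTEP = 4                # hypothetical quantizer step shared with the encoder


def inverse_quantize_transform(levels):
    """Stand-in for inverse quantization + inverse transform (identity)."""
    return [q * QSTEP for q in levels]


def bit_depth_upsample(block):
    return [v << (N_BITS - M_BITS) for v in block]


def motion_compensate(ref, mv):
    """1-D integer-pel MC with edge clamping: out[i] = ref[i + mv]."""
    n = len(ref)
    return [ref[min(max(i + mv, 0), n - 1)] for i in range(n)]


def select_motion(base_mode_flag, el_mv, bl_mv, spatial_ratio=1):
    """Assumed semantics: flag set -> upsampled BL motion (second mode),
    flag clear -> motion extracted from the EL stream (first mode)."""
    return bl_mv * spatial_ratio if base_mode_flag else el_mv


def decode_inter_unit(bl_rec, el_levels, prev_il_residual,
                      base_mode_flag, el_mv, bl_mv):
    mv = select_motion(base_mode_flag, el_mv, bl_mv)
    error = inverse_quantize_transform(el_levels)
    mc = motion_compensate(prev_il_residual, mv)
    il_residual = [m + e for m, e in zip(mc, error)]      # inter-layer residual
    el_pred = bit_depth_upsample(bl_rec)                  # from reconstructed BL
    el_rec = [p + r for p, r in zip(el_pred, il_residual)]
    return el_rec, il_residual


el_rec, _ = decode_inter_unit([10, 20, 30, 40], [0, 1, 0, -1], [1, 2, 3, 4],
                              base_mode_flag=True, el_mv=0, bl_mv=1)
print(el_rec)  # [42, 87, 124, 160]
```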

According to one aspect of the invention, the method for decoding is further specified in that the possible decoding modes further comprise

- a third mode, wherein in the case of intra coded EL information the inverse quantized and inverse transformed EL information results in an EL residual, and
- a fourth mode, wherein in the case of intra coded EL information the inverse quantized and inverse transformed EL information is intra decoded (using I_N×N decoding) to obtain said EL residual.
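The third and fourth decoding modes differ only in whether an intra prediction is added back to the decoded data before the EL is reconstructed. In this minimal sketch, I_N×N is represented by its DC prediction mode alone (the standard also has directional modes), the neighbouring samples `top` and `left` are assumed to be previously decoded residual samples, and a toy 2x2 block is used instead of 4x4 or 8x8. Function names are hypothetical.

```python
def dc_prediction(top, left):
    """DC intra prediction: rounded mean of the neighbouring samples above
    and to the left (the case where both neighbours are available)."""
    n = len(top) + len(left)
    return (sum(top) + sum(left) + n // 2) // n


def decode_intra_residual(decoded, el_intra_flag, top=None, left=None):
    """Third mode (flag clear): the decoded block already is the EL residual.
    Fourth mode (flag set): the decoded block is the intra prediction error
    of the residual, so the (here: DC-only) prediction is added back."""
    if not el_intra_flag:
        return [row[:] for row in decoded]
    dc = dc_prediction(top, left)
    return [[v + dc for v in row] for row in decoded]


def reconstruct_el(bl_rec, residual, shift):
    """EL = bit depth upsampled reconstructed BL + EL residual."""
    return [[(b << shift) + r for b, r in zip(brow, rrow)]
            for brow, rrow in zip(bl_rec, residual)]


decoded = [[1, 0], [0, -1]]
res = decode_intra_residual(decoded, True, top=[4, 4], left=[2, 2])
print(res)                                           # [[4, 3], [3, 2]]
print(reconstruct_el([[10, 10], [10, 10]], res, 2))  # [[44, 43], [43, 42]]
```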

Advantageously, the two mentioned decoder embodiments can be combined into a combined decoder that can adaptively decode intra- and inter-encoded video data.

According to another aspect of the invention, an encoded scalable video signal comprises encoded BL data, encoded EL data and a prediction type indication, wherein the encoded EL data comprise a residual being the difference between a bit depth upsampled BL image and an EL image, the residual comprising differential bit depth information, and wherein the prediction type indication indicates whether or not the decoder must perform spatial intra decoding on the EL data to re-obtain the residual that refers to said bit depth upsampled BL image.

According to another aspect of the invention, an apparatus for encoding video data having a base layer and an enhancement layer, wherein the base layer has lower color resolution and lower spatial resolution than the enhancement layer, comprises means for transforming and means for quantizing base layer data,

- means for inverse transforming and means for inverse quantizing the transformed and quantized base layer data, wherein reconstructed base layer data are obtained,
- means for upsampling the reconstructed base layer data, wherein the upsampling refers at least to bit depth and wherein a predicted version of enhancement layer data is obtained,
- means for generating a residual between original enhancement layer data and the predicted version of enhancement layer data,
- means for selecting, for the case of inter coded enhancement layer, between at least two different inter coding modes, wherein a first inter coding mode comprises using upsampled base layer motion information and a second inter coding mode comprises using motion information generated from said enhancement layer data,
- means for encoding the transformed and quantized base layer data, and means for encoding said enhancement layer residual using the selected enhancement layer encoding mode.

According to another aspect of the invention, an apparatus for decoding video data having a BL and an EL, wherein the BL has lower color resolution and lower spatial resolution than the EL, comprises means for receiving quantized and transformed EL information and BL information and a decoding mode indication, means for performing inverse quantization and inverse transformation on the received EL and BL information, means for upsampling the inverse quantized and inverse transformed BL information, wherein the bit depth per value is increased and wherein predicted EL information is obtained, means for reconstructing EL video information from the predicted EL information and the inverse quantized and inverse transformed EL information, and means for selecting a decoding mode according to said decoding mode indication, wherein possible decoding modes comprise a first mode, wherein in the case of inter coded EL information the inverse quantized and inverse transformed EL information is decoded using motion information extracted from the EL information, and a second mode, wherein in the case of inter coded EL information the inverse quantized and inverse transformed EL information is decoded using motion information extracted from the BL information.

Various embodiments of the presented coding solution are compatible with H.264/AVC and all kinds of scalability that are currently defined in the H.264/AVC scalable extension (SVC).

Advantageous embodiments of the invention are disclosed in the dependent claims, the following description and the figures.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in

FIG. 1 a framework of color bit depth scalable coding;

FIG. 2 an encoder framework of a new Intra coding mode for the bit depth scalable enhancement layer;

FIG. 3 an encoder framework of two new Inter coding modes for the bit depth scalable enhancement layer;

FIG. 4 a decoder framework of two new Inter coding modes for the bit depth scalable enhancement layer; and

FIG. 5 a decoder framework of the new Intra coding mode for the bit depth scalable enhancement layer.

DETAILED DESCRIPTION OF THE INVENTION

As shown in FIG. 1, two videos are used as input to the video encoder: N-bit raw video and M-bit (M<N, usually M=8) video. The M-bit video can be either derived from the N-bit raw video or provided in other ways. The scalable solution can reduce the redundancy between the two layers by using pictures of the BL. The two video streams, one with 8-bit color and the other with N-bit color (N>8), are input to the encoder, and the output is a scalable bitstream. It is also possible that only one N-bit color data stream is input, from which an M-bit (M<N) color data stream is internally generated for the BL. The M-bit video is encoded as the BL using the included H.264/AVC encoder. The information of the BL can be used to improve the coding efficiency of the EL. This is called inter-layer prediction herein. Each picture, i.e. a group of MBs, has two access units, one for the BL and the other one for the EL. The coded bitstreams are multiplexed to form a scalable bitstream. The BL encoder comprises e.g. an H.264/AVC encoder, and the reconstruction is used to predict the N-bit color video, which will be used for the EL encoding.
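Where only the N-bit stream is input, the M-bit BL has to be generated internally. The document leaves this conversion open (it could also be a tone mapping curve); the sketch below assumes the simplest option, a rounding right shift with clipping to the M-bit range. Function names are hypothetical.

```python
def derive_bl_sample(v, n_bits, m_bits):
    """Map one N-bit sample to M bits by a rounding right shift with clipping."""
    shift = n_bits - m_bits
    if shift == 0:
        return v
    rounded = (v + (1 << (shift - 1))) >> shift
    return min(rounded, (1 << m_bits) - 1)


def derive_bl_frame(frame, n_bits, m_bits=8):
    """Apply the per-sample conversion to a frame given as a list of rows."""
    return [[derive_bl_sample(v, n_bits, m_bits) for v in row] for row in frame]


raw_12bit = [[0, 2047], [2056, 4095]]
print(derive_bl_frame(raw_12bit, n_bits=12))  # [[0, 128], [129, 255]]
```

Clipping matters at the top of the range: without it, 4095 would round up to 256, which does not fit into 8 bits.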

As shown in FIG. 1, the scalable bitstream exemplarily contains an AVC compliant BL bitstream, which can be decoded by a BL decoder (conventional AVC decoder). Then the same prediction as in the encoder is done at the decoder side (after evaluation of a respective indication) to get the predicted N-bit video. The EL decoder then uses this N-bit prediction to generate the final N-bit video for a High Quality display HQ.

In the following, the term color bit depth means bit depth, i.e. the number of bits per value. This usually corresponds to color intensity.

In one embodiment, the present invention is based on the current structure of SVC spatial, temporal and quality scalability, and is enhanced by bit depth scalability for enhanced color bit depth. Hence, this embodiment is completely compatible with the current SVC standard. However, it will be easy for the skilled person to adapt it to other standards.

In one embodiment of the invention, three new types of encoding mode can be used, all of which are based on bit depth prediction for bit depth scalability. These new coding modes were designed to solve the problem of how to encode the inter-layer residual more efficiently and more flexibly. The current SVC standard only supports encoding the inter-layer residual in I_BL mode, without any prediction mode selection. For inter coding, the current SVC standard does not support directly encoding the inter-layer residual. Instead, residual inter-layer prediction is done for encoding the difference between the BL residual and the EL residual. In other words, the input to the inter-layer prediction module is the residual of the BL in inter coding, not the reconstructed BL that is used herein. Of the three disclosed new coding modes, one refers to intra coding and the other two to inter coding, for encoding the inter-layer residual based on H.264/AVC.

Intra Coding Mode

The current SVC standard supports two types of coding modes for enhancement layer Intra MBs: one is the original H.264/AVC I_N×N coding mode, and the other one is the SVC-specific coding mode I_BL. In current SVC, I_N×N mode encodes the original N-bit EL video, while I_BL mode codes the inter-layer residual directly without prediction mode selection. The present invention adds a new mode for coding Intra MBs, by treating the inter-layer residual as N-bit video and replacing the original N-bit video with the inter-layer residual. With the presented new Intra mode, the best Intra MB mode is selected between I_BL mode and the I_N×N encoded version of the N-bit inter-layer residual. A framework of Intra coding for a color bit depth scalable codec with this Intra coding mode is shown in FIG. 2.

Depending on a mode selection switch MSS, the EL residual is or is not I_N×N encoded before it is transformed T, quantized Q and entropy coded EC_(EL). The encoder has means for deciding the encoding mode based on RDO, which provides a control signal EL_intra_flag that is also output for correspondingly controlling the decoder. For this purpose, the means for deciding can actually perform the encoding, or only analyze the input image data according to defined parameters, e.g. color or texture smoothness.
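As an example of deciding by analysis instead of full encoding, the sketch below compares the SAD of the residual block with the SAD left after subtracting a DC intra prediction from neighbouring residual samples. This cheap proxy is an assumption, not the disclosed criterion, and it may choose differently from a full RDO; the `bias` allowance for mode signalling and all names are hypothetical.

```python
def sad(block):
    """Sum of absolute values of a block given as a list of rows."""
    return sum(abs(v) for row in block for v in row)


def dc_from_neighbours(top, left):
    """Rounded mean of the neighbouring samples above and to the left."""
    n = len(top) + len(left)
    return (sum(top) + sum(left) + n // 2) // n


def fast_el_intra_flag(residual, top, left, bias=0):
    """Return True (apply I_NxN to the residual) if subtracting a DC intra
    prediction lowers the residual's SAD by more than `bias`."""
    dc = dc_from_neighbours(top, left)
    cost_direct = sad(residual)
    cost_intra = sad([[v - dc for v in row] for row in residual])
    return cost_intra + bias < cost_direct


# A smooth residual with a large offset favours spatial prediction ...
print(fast_el_intra_flag([[20, 21], [19, 20]], top=[20, 20], left=[20, 20]))  # True
# ... while a small residual next to large neighbours does not.
print(fast_el_intra_flag([[1, -1], [0, 0]], top=[20, 20], left=[20, 20]))     # False
```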

A corresponding decoder is shown in FIG. 5. It detects an indication EL_intra_flag in its input data and, in response to the indication, sets the corresponding decoding mode MCC′ in its EL branch. For one value of the indication EL_intra_flag, the inverse quantized and inverse transformed EL residual EL′_(res) is used as it is for decoding, while for another value of the indication EL_intra_flag, spatial prediction I_N×N is performed first. The indication can be contained e.g. in slice header information and be valid for a complete slice.

Inter Coding Mode

For Inter coding, the current SVC standard does not support inter-layer prediction using the reconstructed base layer picture, but supports inter-layer prediction based on the base layer residual, that is the difference between the original BL M-bit video and the reconstructed M-bit counterpart generated by the BL encoder. By utilizing the new Inter coding mode for the EL, inter-layer prediction is done using the reconstructed and upsampled M-bit BL information Pre_(c){BL_(rec)}, as shown in FIG. 3. In the EL branch of the encoder, this inter-layer residual is encoded using one of at least two encoding modes.

The first new EL Inter coding mode comprises encoding the inter-layer residual MB instead of the original N-bit EL MB, with the motion vectors MV_(EL) obtained by motion estimation (ME) from the EL data, and in particular from the current and previous EL residuals.

In the second EL Inter coding mode, the motion vectors for the EL are shared from the BL. ME and motion compensation (MC) are computationally complex; since this mode omits ME on the EL, it saves much processing power in the EL encoder. By sharing the BL motion vectors, both the running time of the encoder and the generated bitrate can be reduced, since no EL motion vectors need to be transmitted. The BL motion data are upsampled MV_(BLUp) and are used for the BL MC MCPred in this mode.

A flag base_mode_flag switches between the two new EL Inter coding modes; this flag is also output together with the encoded BL and EL data for correspondingly controlling the decoder.

A corresponding decoder is shown in FIG. 4. In the particular embodiment of FIG. 4, the BL residual is additionally spatially upsampled, using residual upsampling RUp, before it is bit depth upsampled BDUp. A flag base_mode_flag is detected in the incoming data stream and used to control the decoding mode: if the flag has a first value, motion information EL_(MI) extracted from the incoming EL data stream is used for the EL branch. If the flag has a second value, upsampled MUp motion information from the BL, which was extracted from the incoming BL data stream and then upsampled, is used for the EL branch. Other parts (image data) of the incoming BL data stream are inverse quantized and inverse transformed, and the resulting residual BL_(res,k) is used to construct the BL video (if required) and for upsampling (if EL video is required). In principle, it is sufficient if the scalable decoder generates either BL video or EL video, depending on the requirements defined by a user.
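The RUp-then-BDUp chain of FIG. 4 can be illustrated with simple stand-ins: nearest-neighbour 2x spatial upsampling for RUp (SVC specifies its own interpolation filters, which are not reproduced here) and a left shift by N - M for BDUp. This is a minimal sketch under those assumptions; the function names and the toy residual are hypothetical.

```python
def residual_upsample_2x(block):
    """RUp stand-in: nearest-neighbour 2x spatial upsampling."""
    out = []
    for row in block:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def bit_depth_upsample(block, shift):
    """BDUp stand-in: left shift by the bit depth difference N - M
    (negative residual values are scaled correctly in Python)."""
    return [[v << shift for v in row] for row in block]


bl_res = [[1, -2], [3, 0]]          # toy BL residual BL_(res,k)
print(bit_depth_upsample(residual_upsample_2x(bl_res), shift=2))
# [[4, 4, -8, -8], [4, 4, -8, -8], [12, 12, 0, 0], [12, 12, 0, 0]]
```

With these two stand-ins the order of RUp and BDUp does not affect the result; with interpolating filters that round intermediate values, the order may matter.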

The two main advantages of the presented new coding modes of the EL for color bit depth scalable coding are: first, the new coding modes provide more mode options for the encoder, which is especially useful for RDO, since RDO then has more choices and better optimization is possible. Second, with these new modes the inter-layer residual is encoded directly, and higher coding efficiency is achieved.

Thus, the invention can be used for scalable encoders, scalable decoders and scalable signals, particularly for video signals or other types of signals that have different quality layers and high inter-layer redundancy.

It will be understood that the present invention has been described purely by way of example, and modifications of detail can be made without departing from the scope of the invention. Each feature disclosed in the description and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination. Features may (where appropriate) be implemented in hardware, software, or a combination of the two. Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

1-11. (canceled)
12. Method for encoding video data having a base layer and an enhancement layer, wherein the base layer has lower color resolution than the enhancement layer, the method comprising the steps of transforming and quantizing base layer data; inverse transforming and inverse quantizing the transformed and quantized base layer data, wherein reconstructed base layer data are obtained; upsampling the reconstructed base layer data, wherein the upsampling refers at least to bit depth and wherein a predicted version of enhancement layer data is obtained; generating a residual between original enhancement layer data and the predicted version of enhancement layer data; selecting for the case of inter coded enhancement layer between at least two different inter coding modes, wherein a first inter coding mode comprises using upsampled base layer motion information and a second inter coding mode comprises using motion information generated from said enhancement layer data; encoding the transformed and quantized base layer data; and encoding said enhancement layer residual using the selected enhancement layer encoding mode and encoding an indication indicating said encoding mode.
13. Method according to claim 12, further comprising the step of selecting for the case of intra coded enhancement layer between at least two different intra coding modes, wherein at least one but not all of the intra coding modes comprises additional intra coding of said residual.
14. Method according to claim 12, wherein the steps of selecting between different coding modes comprise a step of rate-distortion optimization.
15. Method according to claim 12, wherein the step of upsampling also comprises spatial upsampling.
16. Method for decoding scalable video data having a base layer and an enhancement layer, wherein the base layer has less bit depth than the enhancement layer, comprising the steps of receiving quantized and transformed enhancement layer information and base layer information and a decoding mode indication; performing inverse quantization and inverse transformation on the received enhancement layer and base layer information; upsampling inverse quantized and inverse transformed base layer information, wherein the bit depth per value is increased and wherein predicted enhancement layer information is obtained; and reconstructing from the predicted enhancement layer information and the inverse quantized and inverse transformed enhancement layer information reconstructed enhancement layer video information, wherein a decoding mode according to said decoding mode indication is selected, wherein possible decoding modes comprise a first mode, wherein in the case of inter coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information is decoded using motion information extracted from the enhancement layer information; and a second mode, wherein in the case of inter coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information is decoded using motion information extracted from the base layer information.
17. Method according to claim 16, wherein a reconstructed enhancement layer residual is obtained, further comprising the step of adding the reconstructed enhancement layer residual to reconstructed, motion compensated enhancement layer information.
18. Method according to claim 16, wherein possible decoding modes further comprise a third mode, wherein in the case of intra coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information results in an enhancement layer residual; and a fourth mode, wherein in the case of intra coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information is intra decoded to obtain said enhancement layer residual.
19. Apparatus for encoding video data having a base layer and an enhancement layer, wherein the base layer has lower color resolution than the enhancement layer, comprising means for transforming and quantizing base layer data; means for inverse transforming and inverse quantizing the transformed and quantized base layer data, wherein reconstructed base layer data are obtained; means for upsampling the reconstructed base layer data, wherein the upsampling refers at least to bit depth and wherein a predicted version of enhancement layer data is obtained; means for generating a residual between original enhancement layer data and the predicted version of enhancement layer data; means for selecting for the case of inter coded enhancement layer between at least two different inter coding modes, wherein a first inter coding mode comprises using upsampled base layer motion information and a second inter coding mode comprises using motion information generated from said enhancement layer data; means for encoding the transformed and quantized base layer data; and means for encoding said enhancement layer residual using the selected enhancement layer encoding mode and means for encoding an indication indicating said encoding mode.
20. Apparatus according to claim 19, wherein the means for upsampling comprises means for increasing the number of pixels and means for increasing the number of values that each pixel can have.
21. Apparatus for decoding video data having a base layer and an enhancement layer, wherein the base layer has lower color resolution than the enhancement layer, comprising means for receiving quantized and transformed enhancement layer information and base layer information and a decoding mode indication; means for performing inverse quantization and inverse transformation on the received enhancement layer and base layer information; means for upsampling inverse quantized and inverse transformed base layer information, wherein the bit depth per value is increased and wherein predicted enhancement layer information is obtained; means for reconstructing from the predicted enhancement layer information and the inverse quantized and inverse transformed enhancement layer information reconstructed enhancement layer video information; and means for selecting a decoding mode, wherein a decoding mode according to said decoding mode indication is selected, wherein possible decoding modes comprise a first mode, wherein in the case of inter coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information is decoded using motion information extracted from the enhancement layer information, and a second mode, wherein in the case of inter coded enhancement layer information the inverse quantized and inverse transformed enhancement layer information is decoded using motion information extracted from the base layer information.
22. Apparatus according to claim 21, wherein the means for upsampling comprises means for increasing the number of pixels and means for increasing the number of values that each pixel can have.
23. Encoded scalable video signal comprising encoded base layer data, encoded enhancement layer data and a prediction type indication, wherein the encoded enhancement layer data comprises a residual being the difference between a bit depth upsampled base layer image and an enhancement layer image, the residual comprising differential bit depth information, and wherein the prediction type indication indicates whether or not the decoder must perform spatial intra decoding on the enhancement layer data to re-obtain the residual that refers to said bit depth upsampled base layer image.